Get Data

This notebook is based almost entirely on the code in notebooks 09 and 11.

We have latent variable expression analysis data:
- Latent Variable Table
- Latent Variables selected by Random Forest

For this data we also use any samples for which there are gene variants (cNFs, pNFs, MPNSTs):
- Exome-Seq variants
- WGS Variants

Lastly, we need to filter to genes that are expressed, to avoid getting too many non-qualifying variants:
- RNA-Seq Data

Let’s see if there are any LVs that split based on gene variant status. Because we’re having trouble scaling with the number of latent variables, I only look at rare variants, those with a gnomAD allele frequency below 1% (gnomAD_AF < 0.01). Note that this differs from notebook 11.

library(synapser)
library(tidyverse)
synLogin()

wgs.vars=synTableQuery("SELECT Hugo_Symbol,Protein_position,specimenID,IMPACT,FILTER,ExAC_AF,gnomAD_AF FROM syn20551862")$asDataFrame()
exome.vars=synTableQuery("SELECT Hugo_Symbol,Protein_position,specimenID,IMPACT,FILTER,ExAC_AF,gnomAD_AF FROM syn20554939")$asDataFrame()
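# combine WGS and exome calls, keeping only rare variants (gnomAD allele frequency < 1%);
# note that subset() also drops variants whose gnomAD_AF is NA
all.vars<-rbind(select(wgs.vars,'Hugo_Symbol','Protein_position','specimenID','IMPACT','gnomAD_AF'),
    select(exome.vars,'Hugo_Symbol','Protein_position','specimenID','IMPACT','gnomAD_AF'))%>%
  subset(gnomAD_AF<0.01)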
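A minimal sketch of the rare-variant filter on a hypothetical toy table (the gene names, samples and frequencies are made up). It shows that subset(gnomAD_AF < 0.01) keeps variants below 1% allele frequency and, because subset() treats an NA condition as FALSE, also drops variants that have no gnomAD frequency at all. Whether such variants should be kept is a judgment call; the second filter shows one way to keep them, assuming NA means "not observed in gnomAD".

```r
# hypothetical toy variant calls; gnomAD_AF is NA for a variant absent from gnomAD
toy.vars <- data.frame(
  Hugo_Symbol = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  specimenID  = c("s1", "s1", "s2", "s3"),
  gnomAD_AF   = c(0.001, 0.2, NA, 0.009),
  stringsAsFactors = FALSE
)

# same filter as in all.vars: rows with NA gnomAD_AF are silently dropped
rare.strict <- subset(toy.vars, gnomAD_AF < 0.01)

# alternative (assumption): treat NA as "not seen in gnomAD" and keep it
rare.with.novel <- subset(toy.vars, is.na(gnomAD_AF) | gnomAD_AF < 0.01)

print(rare.strict$Hugo_Symbol)
print(rare.with.novel$Hugo_Symbol)
```

In this toy table the strict filter keeps GENE_A and GENE_D, while the alternative also keeps GENE_C.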

# syn21221980 lists the Synapse tables holding the RNA-Seq data; query each one and stack them
tabids<-synTableQuery('select distinct tableId from syn21221980')$asDataFrame()

vars="specimenID,individualID,Symbol,totalCounts,zScore,tumorType,nf1Genotype,sex"

full.tab<-do.call(rbind,lapply(tabids$tableId,function(x) synTableQuery(paste('select',vars,'from',x))$asDataFrame()))
# keep only genes that are expressed (totalCounts > 0) in every sample
expr.genes<-full.tab%>%group_by(Symbol)%>%
  summarize(minExpr=min(totalCounts))%>%
  subset(minExpr>0)%>%ungroup()%>%select(Symbol)%>%
  distinct()
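A minimal sketch of the expressed-gene filter on a hypothetical toy table (genes, samples and counts are made up). A gene counts as expressed only if its smallest totalCounts across all samples is above zero, so a single zero-count sample removes it. This sketch uses filter() and pull() in place of the original subset()/select()/distinct(), which should yield the same set of symbols.

```r
library(dplyr)

# hypothetical toy expression table: one row per gene per sample
toy.expr <- data.frame(
  specimenID  = rep(c("s1", "s2", "s3"), times = 3),
  Symbol      = rep(c("GENE_A", "GENE_B", "GENE_C"), each = 3),
  totalCounts = c(10, 5, 3,   # GENE_A: expressed everywhere
                  0, 8, 2,    # GENE_B: zero counts in s1
                  1, 1, 1),   # GENE_C: low but non-zero everywhere
  stringsAsFactors = FALSE
)

# same logic as expr.genes: the minimum count across samples must be > 0
toy.expr.genes <- toy.expr %>%
  group_by(Symbol) %>%
  summarize(minExpr = min(totalCounts)) %>%
  filter(minExpr > 0) %>%
  pull(Symbol)

print(toy.expr.genes)
```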

# latent variables selected as predictive by the Random Forest
top.lvs<-synTableQuery("SELECT * from syn21318452")$asDataFrame()
# LV scores for tumor samples only (cell lines excluded), restricted to the selected LVs
mp_res<-synTableQuery("SELECT * FROM syn21046991")$asDataFrame()%>%
  filter(isCellLine != "TRUE")%>%
  subset(latent_var%in%top.lvs$LatentVar)%>%
  select(latent_var,id,value,specimenID,tumorType,modelOf,diagnosis)

Merge data together

For this analysis we keep only the samples with genomic data, only the latent variables that the Random Forest selected as predictive, and only variants in expressed genes.

expr.vars<-subset(all.vars,Hugo_Symbol%in%expr.genes$Symbol)

samps<-intersect(mp_res$specimenID,expr.vars$specimenID)

mp_res<-mp_res%>%
  subset(specimenID%in%samps)#%>%
#  group_by(latent_var) %>%
#  mutate(sd_value = sd(value)) %>%
#  filter(sd_value > 0.025) %>%
#  ungroup()

Retrieve Variant Data

Let’s join the LV scores to the variant calls and summarize how many genes are mutated across samples.

data.with.var<-mp_res%>%
  left_join(expr.vars,by='specimenID')
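To make the merge concrete, here is a small sketch on hypothetical toy tables (one LV, three samples; all names and values are made up). Sample s3 has no variant calls and is removed by the intersect() step, and the left join then repeats an LV score once for every variant in that sample, so data.with.var has one row per LV/sample/variant rather than one per LV/sample.

```r
library(dplyr)

# hypothetical LV scores (one LV x three samples) and variant calls
toy.lv <- data.frame(
  latent_var = "LV 1",
  specimenID = c("s1", "s2", "s3"),
  value      = c(0.1, 0.4, 0.2),
  stringsAsFactors = FALSE
)
toy.vars <- data.frame(
  Hugo_Symbol = c("GENE_A", "GENE_B", "GENE_A"),
  specimenID  = c("s1", "s1", "s2"),
  stringsAsFactors = FALSE
)

# keep only samples present in both tables (s3 has no variant calls)
toy.samps <- intersect(toy.lv$specimenID, toy.vars$specimenID)
toy.lv <- subset(toy.lv, specimenID %in% toy.samps)

# the left join repeats each LV score once per variant in that sample
toy.joined <- left_join(toy.lv, toy.vars, by = "specimenID")
print(toy.joined)
```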

tab<-data.with.var

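# keep genes mutated in more than one sample but in fewer than (numSamps - 1) samples,
# i.e. genes with at least two mutated and at least two wild-type samples
top.genes=tab%>%#group_by(tumorType)%>%
  mutate(numSamps=n_distinct(specimenID))%>%
  group_by(Hugo_Symbol)%>%
  mutate(numMutated=n_distinct(specimenID))%>%
  ungroup()%>%
  subset(numMutated>1)%>%
  subset(numMutated<(numSamps-1))%>%
  select(tumorType,Hugo_Symbol,numSamps,numMutated)%>%distinct()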
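The two subset() calls in top.genes are easiest to see on a hypothetical toy table of five samples (genes and counts are made up). Requiring numMutated > 1 and numMutated < numSamps - 1 keeps only genes with at least two mutated and at least two wild-type samples, so a gene mutated in four of five samples is dropped along with the all-mutated and singleton genes.

```r
library(dplyr)

# hypothetical joined table: one row per (sample, mutated gene), five samples
toy.tab <- data.frame(
  specimenID  = c("s1", "s2", "s3", "s4", "s5",   # GENE_A: all 5 samples
                  "s1",                           # GENE_B: 1 sample
                  "s1", "s2", "s3",               # GENE_C: 3 samples
                  "s2", "s3", "s4", "s5"),        # GENE_D: 4 samples
  Hugo_Symbol = c(rep("GENE_A", 5), "GENE_B", rep("GENE_C", 3), rep("GENE_D", 4)),
  stringsAsFactors = FALSE
)

# same thresholds as top.genes
toy.top <- toy.tab %>%
  mutate(numSamps = n_distinct(specimenID)) %>%
  group_by(Hugo_Symbol) %>%
  mutate(numMutated = n_distinct(specimenID)) %>%
  ungroup() %>%
  filter(numMutated > 1, numMutated < (numSamps - 1)) %>%
  distinct(Hugo_Symbol, numSamps, numMutated)

print(toy.top)
```

Only GENE_C (three mutated, two wild-type samples) survives here.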

gene.count=top.genes%>%group_by(tumorType)%>%
  mutate(numGenes=n_distinct(Hugo_Symbol))%>%
  mutate(minMutated=min(numMutated))%>%
  mutate(maxMutated=max(numMutated))%>%
  select(tumorType,numGenes,minMutated,maxMutated)%>%distinct()

DT::datatable(gene.count)

Test significance of each gene/latent variable

Now we test every LV/gene pair with a Wilcoxon rank-sum test, comparing LV scores between mutated and wild-type samples, and correct for multiple testing within each LV (p.adjust’s default Holm method).

#red.genes<-c("NF1","SUZ12","CDKN2A","EED")##for testing

##first spread the WT/Mutated values
vals<-tab%>%subset(Hugo_Symbol%in%top.genes$Hugo_Symbol)%>%
    mutate(mutated=ifelse(is.na(IMPACT),'WT','Mutated'))%>%
  select(latent_var,tumorType,value,Hugo_Symbol,specimenID,mutated)%>%
  distinct()%>%
  spread(key=Hugo_Symbol,value='mutated',fill='WT')

##double check to make sure there are both mutated and unmutated values
counts<-vals%>%
  gather(key=gene,value=status,-c(latent_var,tumorType,value,specimenID))%>% 
    select(latent_var,tumorType,value,gene,specimenID,status)%>%
    group_by(latent_var,gene)%>%
    mutate(numVals=n_distinct(status))%>%
    mutate(numSamps=n_distinct(specimenID))%>%
    subset(numVals==2)%>%ungroup()
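A sketch of the spread/gather round trip on a hypothetical toy table (one LV, three samples; all values are made up). Samples with no call for a gene are filled in as "WT", and any gene that ends up with only one status for an LV (here GENE_A, mutated in every sample) is dropped because there is nothing to compare it against. The filter(n_distinct(status) == 2) line is a compact stand-in for the numVals column used above.

```r
library(dplyr)
library(tidyr)

# hypothetical long table: one LV score per sample, one row per mutated gene
toy.long <- data.frame(
  latent_var  = "LV 1",
  specimenID  = c("s1", "s1", "s2", "s3"),
  value       = c(0.1, 0.1, 0.4, 0.2),
  Hugo_Symbol = c("GENE_A", "GENE_B", "GENE_A", "GENE_A"),
  mutated     = "Mutated",
  stringsAsFactors = FALSE
)

# wide: one column per gene; samples without a call for that gene become "WT"
toy.wide <- spread(toy.long, key = Hugo_Symbol, value = mutated, fill = "WT")

# back to long, keeping only LV/gene pairs where both statuses are present
toy.counts <- toy.wide %>%
  gather(key = gene, value = status, -c(latent_var, value, specimenID)) %>%
  group_by(latent_var, gene) %>%
  filter(n_distinct(status) == 2) %>%
  ungroup()

print(toy.counts)
```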

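# counts now holds only LV/gene pairs with both mutated and WT samples
with.sig<-counts%>%ungroup()%>%#subset(gene%in%top.genes$Hugo_Symbol)%>%
  group_by(latent_var,gene)%>%
  # Wilcoxon rank-sum test of LV scores, mutated vs. WT
  mutate(pval=wilcox.test(value~status)$p.value)%>%ungroup()%>%
  # Holm correction within each LV; pval is repeated on every sample row,
  # so p.adjust() counts rows rather than distinct tests (conservative)
  group_by(latent_var)%>%
  mutate(corP=p.adjust(pval))%>%ungroup()%>%
  select(latent_var,gene,pval,corP)%>%distinct()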
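In the with.sig code, pval is computed with mutate(), so each p-value is repeated on every sample row and p.adjust() treats each row as a separate test, which inflates the number of tests. A minimal sketch of one alternative, assuming the intent is one test per LV/gene pair: compute the p-value with summarize() so each pair contributes a single row, then apply p.adjust() within each LV. The toy data are hypothetical; on the real data, counts could be used in place of toy.counts, since it has the same latent_var, gene, value and status columns.

```r
library(dplyr)

# hypothetical long table: six samples, one LV, two genes with mutated/WT status
toy.counts <- data.frame(
  latent_var = "LV 1",
  specimenID = rep(paste0("s", 1:6), times = 2),
  value      = rep(c(5.1, 6.2, 7.3, 1.4, 2.5, 3.6), times = 2),
  gene       = rep(c("GENE_A", "GENE_B"), each = 6),
  status     = c("Mutated", "Mutated", "Mutated", "WT", "WT", "WT",   # GENE_A splits high/low
                 "Mutated", "WT", "Mutated", "WT", "Mutated", "WT"),  # GENE_B is mixed
  stringsAsFactors = FALSE
)

# one Wilcoxon rank-sum test per LV/gene pair (summarize, not mutate) ...
toy.sig <- toy.counts %>%
  group_by(latent_var, gene) %>%
  summarize(pval = wilcox.test(value[status == "Mutated"],
                               value[status == "WT"])$p.value,
            .groups = "drop") %>%
  # ... then correct across the genes tested for each LV (Holm, the p.adjust default)
  group_by(latent_var) %>%
  mutate(corP = p.adjust(pval)) %>%
  ungroup()

print(toy.sig)
```

With no ties and only six samples, wilcox.test() computes an exact p-value here; the real data may have ties, in which case it falls back to a normal approximation.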

sig.vals<-subset(with.sig,corP<0.01)

DT::datatable(sig.vals%>%group_by(latent_var)%>%summarize(numGenes=n_distinct(gene)))

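Interesting! Some genes actually pass multiple-testing correction. What do they look like? The code below is messy, but for each LV it prints the significant genes with their corrected p-values and a boxplot of LV scores by mutation status.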

library(nationalparkcolors)

# two-colour palette keyed by mutation status
val<-park_palette('Acadia',2)
names(val)<-c('Mutated','WT')

for(ct in unique(sig.vals$latent_var)){
  tplot<-sig.vals[which(sig.vals$latent_var==ct),]
  if(nrow(tplot)==0)
    next

  print(ct)
  # "GENE:corrected p" labels, used in the printout and the plot title
  sigs=tplot%>%rowwise()%>%
    mutate(vals=paste(gene,format(corP,digits=3),sep=':'))%>%
    select(vals)%>%unlist()%>%paste(collapse=',')
  print(sigs)
  p<-counts%>%
    subset(latent_var==ct)%>%
    subset(gene%in%tplot$gene)%>%
    ggplot(aes(x=gene,y=value,col=status))+
    geom_boxplot(outlier.shape=NA)+
    geom_point(position=position_jitterdodge(),aes(shape=tumorType,col=status,group=status))+
    theme_bw()+
    # rotate gene labels; this must come after theme_bw(), a complete theme that resets earlier theme() settings
    theme(axis.text.x = element_text(angle = 90, hjust = 1))+
    ggtitle(paste(ct,'scores\n',sigs))+
    scale_color_manual(values=val)
  print(p)
}
## [1] "451,REACTOME_MITOCHONDRIAL_PROTEIN_IMPORT"
## [1] "ANP32B:0.000683,CDC27:0.000346,CTBP2:0.0032,CTDSP2:0.000346,FAM104B:0.000346,GGT1:0.000724,IGSF3:0.000346,SLC25A5:0.000346,ZNF717:0.000346"

## [1] "720,PID_FANCONI_PATHWAY"
## [1] "ANP32B:4.21e-05,CDC27:0.000112,CTBP2:6.67e-05,CTDSP2:0.000112,FAM104B:0.000112,GGT1:8.75e-05,IGSF3:0.000112,SLC25A5:0.000112,ZNF717:0.000112"

## [1] "LV 185"
## [1] "ANP32B:2.45e-05,CDC27:2.49e-06,CTBP2:0.000683,CTDSP2:2.49e-06,FAM104B:2.49e-06,GGT1:5.82e-05,IGSF3:2.49e-06,SLC25A5:2.49e-06,ZNF717:2.49e-06"

## [1] "LV 308"
## [1] "ANP32B:0.00178,CDC27:0.000167,CTBP2:0.00034,CTDSP2:0.000167,FAM104B:0.000167,IGSF3:0.000167,SLC25A5:0.000167,ZNF717:0.000167"

## [1] "LV 376"
## [1] "ANP32B:0.00424,CDC27:0.000676,CTBP2:0.00945,CTDSP2:0.000676,FAM104B:0.000676,GGT1:0.00526,IGSF3:0.000676,SLC25A5:0.000676,ZNF717:0.000676"

## [1] "LV 384"
## [1] "ANP32B:3.5e-06,CDC27:2.49e-06,CTBP2:0.00239,CTDSP2:2.49e-06,FAM104B:2.49e-06,GGT1:5.82e-05,IGSF3:2.49e-06,SLC25A5:2.49e-06,ZNF717:2.49e-06"

## [1] "LV 442"
## [1] "ANP32B:0.000487,CDC27:0.000485,CTBP2:0.00131,CTDSP2:0.000485,FAM104B:0.000485,GGT1:0.000528,IGSF3:0.000485,SLC25A5:0.000485,ZNF717:0.000485"

## [1] "LV 445"
## [1] "ANP32B:0.00424,CDC27:0.000485,CTDSP2:0.000485,FAM104B:0.000485,GGT1:0.00178,IGSF3:0.000485,SLC25A5:0.000485,ZNF717:0.000485"

## [1] "LV 546"
## [1] "ANP32B:0.00131,CDC27:0.000485,CTDSP2:0.000485,FAM104B:0.000485,IGSF3:0.000485,SLC25A5:0.000485,ZNF717:0.000485"

## [1] "LV 624"
## [1] "ANP32B:0.000105,CDC27:2.98e-05,CTBP2:0.00239,CTDSP2:2.98e-05,FAM104B:2.98e-05,GGT1:0.000188,IGSF3:2.98e-05,SLC25A5:2.98e-05,ZNF717:2.98e-05"

## [1] "LV 835"
## [1] "ANP32B:0.00034,CDC27:4.72e-05,CTBP2:0.00239,CTDSP2:4.72e-05,FAM104B:4.72e-05,GGT1:0.00027,IGSF3:4.72e-05,SLC25A5:4.72e-05,ZNF717:4.72e-05"

## [1] "LV 849"
## [1] "ANP32B:0.00945,CDC27:0.000112,CTBP2:0.00558,CTDSP2:0.000112,FAM104B:0.000112,GGT1:0.00526,IGSF3:0.000112,SLC25A5:0.000112,ZNF717:0.000112"

## [1] "LV 851"
## [1] "ANP32B:0.000487,CDC27:2.49e-06,CTBP2:3.5e-06,CTDSP2:2.49e-06,FAM104B:2.49e-06,GGT1:1.36e-05,IGSF3:2.49e-06,SLC25A5:2.49e-06,ZNF717:2.49e-06"

## [1] "LV 984"
## [1] "ANP32B:0.000952,CDC27:2.49e-06,CTBP2:0.0032,CTDSP2:2.49e-06,FAM104B:2.49e-06,GGT1:1.36e-05,IGSF3:2.49e-06,SLC25A5:2.49e-06,ZNF717:2.49e-06"

## [1] "31,SVM B cells naive"
## [1] "CDC27:0.000676,CTDSP2:0.000676,FAM104B:0.000676,GGT1:0.000529,IGSF3:0.000676,SLC25A5:0.000676,ZNF717:0.000676"

## [1] "4,REACTOME_NEURONAL_SYSTEM"
## [1] "CDC27:0.000676,CTBP2:0.000953,CTDSP2:0.000676,FAM104B:0.000676,GGT1:0.00235,IGSF3:0.000676,SLC25A5:0.000676,ZNF717:0.000676"

## [1] "LV 229"
## [1] "CDC27:0.000676,CTDSP2:0.000676,FAM104B:0.000676,GGT1:0.000986,IGSF3:0.000676,SLC25A5:0.000676,ZNF717:0.000676"

## [1] "LV 644"
## [1] "CDC27:0.000167,CTBP2:0.000105,CTDSP2:0.000167,FAM104B:0.000167,IGSF3:0.000167,SLC25A5:0.000167,ZNF717:0.000167"

## [1] "LV 653"
## [1] "CDC27:0.00126,CTBP2:0.00729,CTDSP2:0.00126,FAM104B:0.00126,IGSF3:0.00126,SLC25A5:0.00126,ZNF717:0.00126"

## [1] "LV 665"
## [1] "CDC27:0.0017,CTDSP2:0.0017,FAM104B:0.0017,IGSF3:0.0017,SLC25A5:0.0017,ZNF717:0.0017"

## [1] "LV 72"
## [1] "CDC27:0.00301,CTDSP2:0.00301,FAM104B:0.00301,GGT1:0.00679,IGSF3:0.00301,SLC25A5:0.00301,ZNF717:0.00301"

## [1] "LV 864"
## [1] "CDC27:0.00126,CTDSP2:0.00126,FAM104B:0.00126,IGSF3:0.00126,SLC25A5:0.00126,ZNF717:0.00126"

#}

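I’m not sure how to interpret this: most LVs are associated with the same set of mutated genes. CDC27, CTDSP2, FAM104B, IGSF3, SLC25A5 and ZNF717 appear for every LV shown, and within each LV these six genes share an identical corrected p-value. Not sure why this is.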
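One possible, untested explanation is that these genes are mutated in exactly the same samples: identical mutated/WT splits would give every one of them the same Wilcoxon p-value for a given LV. A minimal sketch of how one might check this on a hypothetical toy table; on the real data, expr.vars (optionally restricted to the genes in sig.vals) could be substituted for toy.vars, since it has the same Hugo_Symbol and specimenID columns.

```r
library(dplyr)

# hypothetical variant calls: which samples carry a variant in which gene
toy.vars <- data.frame(
  Hugo_Symbol = c("GENE_A", "GENE_A", "GENE_B", "GENE_B", "GENE_C", "GENE_C"),
  specimenID  = c("s1", "s4", "s1", "s4", "s2", "s3"),
  stringsAsFactors = FALSE
)

# collapse each gene's mutated samples into one sorted key,
# then group together genes that share exactly the same key
toy.patterns <- toy.vars %>%
  distinct(Hugo_Symbol, specimenID) %>%
  group_by(Hugo_Symbol) %>%
  summarize(samples = paste(sort(specimenID), collapse = ",")) %>%
  group_by(samples) %>%
  summarize(genes = paste(sort(Hugo_Symbol), collapse = ","), nGenes = n())

print(toy.patterns)
```

Any row with nGenes greater than one marks a group of genes that would be indistinguishable in the Wilcoxon tests.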

#this is a failed attempt to group by tumor type
#with.sig<-counts%>%ungroup()%>%subset(gene%in%top.genes$Hugo_Symbol)%>%
#    group_by(latent_var,tumorType,gene)%>%
#  mutate(pval=t.test(value~status)$p.value)%>%
#  ungroup()%>%
#  group_by(latent_var)%>%
#  mutate(corP=p.adjust(pval))%>%ungroup()%>%
#  select(latent_var,tumorType,gene,pval,corP)%>%distinct()

#sig.vals<-subset(with.sig,corP<0.05)

#DT::datatable(sig.vals)